Models trained via empirical risk minimization (ERM) are known to rely on spurious correlations between labels and task-independent input features, resulting in poor generalization under distribution shift. Group distributionally robust optimization (G-DRO) can alleviate this problem by minimizing the worst-case loss over a set of pre-defined groups of training data. G-DRO successfully improves performance on the worst group, where the correlation does not hold. However, G-DRO assumes that the spurious correlations and the associated worst groups are known in advance, making it challenging to apply to new tasks with potentially multiple unknown spurious correlations. We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them. AGRO equips G-DRO with an adversarial slicing model that finds a group assignment over training examples which maximizes the worst-case loss over the discovered groups. On the WILDS benchmark, AGRO yields 8% higher model performance on average on known worst groups than prior group discovery approaches used with G-DRO. AGRO also improves out-of-distribution performance on SST2, QQP, and MS-COCO -- datasets whose potential spurious correlations are as yet uncharacterized. Human evaluation of AGRO groups shows that they contain well-defined, yet previously unstudied, spurious correlations that lead to model errors.
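The core objective can be pictured as a two-player game: a slicing (group-assignment) model tries to increase the worst-group loss while the task model tries to decrease it. Below is a minimal, hedged PyTorch sketch of that joint update; the soft group assignment, the sign-flip trick for gradient ascent, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def adversarial_gdro_step(model, grouper, batch, opt_model, opt_grouper):
    """One joint update: the grouper (adversary) is pushed to *increase* the
    worst-group loss, while the task model is pushed to decrease it."""
    x, y = batch
    losses = F.cross_entropy(model(x), y, reduction="none")  # per-example loss

    # Soft group membership q(g | x): [batch, n_groups]
    group_probs = torch.softmax(grouper(x), dim=-1)

    # Expected loss of each discovered group, weighted by its membership mass
    group_mass = group_probs.sum(dim=0) + 1e-8
    group_loss = (group_probs * losses.unsqueeze(-1)).sum(dim=0) / group_mass
    worst_group_loss = group_loss.max()

    opt_model.zero_grad()
    opt_grouper.zero_grad()
    worst_group_loss.backward()

    # Flip the grouper's gradients so its optimizer performs gradient *ascent*.
    for p in grouper.parameters():
        if p.grad is not None:
            p.grad.neg_()

    opt_model.step()
    opt_grouper.step()
    return worst_group_loss.item()
```

Here `model` and `grouper` are assumed to be any modules producing class logits and group logits, respectively; a single backward pass with a sign flip is one simple way to realize the min-max structure.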
Motivated by recent evidence pointing to the brittleness of high-performance span prediction models, we direct our attention to multiple-choice reading comprehension. In particular, this work introduces a novel method for improving answer selection via weighted global normalization of predictions over portions of the document. We show that applying our method to a span prediction model adapted for answer selection helps model performance on the long summaries of NarrativeQA, a challenging reading comprehension dataset with an answer selection task, where we strongly improve over the task baseline by +36.2 mean reciprocal rank.
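A hedged sketch of how weighted global normalization over document portions might look in code: each portion contributes locally scored answer candidates plus a portion weight, and a single softmax over all (portion, candidate) pairs replaces per-portion normalization. Shapes and names are assumptions, not the paper's exact formulation.

```python
import torch

def weighted_global_scores(chunk_span_logits, chunk_weight_logits):
    """
    chunk_span_logits:   [n_chunks, n_candidates] score of each answer candidate
                         computed independently within each document portion.
    chunk_weight_logits: [n_chunks] relevance score of each portion.
    Returns one probability per candidate, normalized globally across all
    portions rather than within each portion separately.
    """
    # Joint score of (portion, candidate): portion weight + local span score
    joint = chunk_weight_logits.unsqueeze(-1) + chunk_span_logits
    # A single softmax over the flattened space gives the global normalization
    probs = torch.softmax(joint.view(-1), dim=0).view_as(joint)
    # Marginalize over portions to score each answer candidate
    return probs.sum(dim=0)

scores = weighted_global_scores(torch.randn(3, 5), torch.randn(3))  # 3 portions, 5 candidates
```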
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, yet the main LM benchmarks are non-interactive: a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks, ranging from goal-oriented to open-ended, to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always translate into better human-LM interaction and that first-person and third-party metrics can diverge, underscoring the importance of examining the nuances of human-LM interaction.
Virtual Product Placement (VPP) is the advertising technique of digitally placing a branded object into the scene of a movie or TV show. This type of advertising lets brands reach consumers without interrupting the viewing experience with a commercial break, as the products appear in the background or as props. Despite this being a billion-dollar industry, ad rendering is currently performed at the post-production stage, either manually with the help of VFX artists or through semi-automated solutions. In this paper, we demonstrate a fully automated framework to digitally place 2-D ads in linear TV cooking shows captured with a single-view camera with small camera movements. Without access to the full video or the production camera configuration, this framework performs the following tasks: (i) identifying empty space for 2-D ad placement, (ii) kitchen scene understanding, (iii) occlusion handling, (iv) ambient lighting, and (v) ad tracking.
Current spoken dialogue systems initiate their turns after long periods of silence (700-1000 ms), which leads to little real-time feedback, sluggish responses, and an overall stilted conversational flow. Humans typically respond within 200 ms, and successfully predicting initiation points in advance would enable spoken dialogue agents to do the same. In this work, we predict the time until turn initiation using prosodic features from a pre-trained speech representation model (wav2vec 1.0) operating on user audio and word features from a pre-trained language model (GPT-2) operating on incremental transcriptions. To evaluate errors, we propose two metrics w.r.t. the predicted and true initiation times. We train and evaluate our models on the Switchboard corpus and find that our method outperforms prior work on these metrics and substantially outperforms the common approach of waiting for 700 ms of silence.
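A minimal sketch of the fusion idea: pooled prosodic features from a wav2vec-style speech encoder and pooled word features from a GPT-2-style language model are concatenated and regressed onto the time remaining until the initiation point. Feature extraction is stubbed with random tensors; the dimensions and the regression head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TurnInitiationRegressor(nn.Module):
    def __init__(self, speech_dim=512, text_dim=768, hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # predicted seconds until turn initiation
        )

    def forward(self, speech_feats, text_feats):
        # speech_feats: [batch, speech_dim] pooled wav2vec-style frame features
        # text_feats:   [batch, text_dim]  pooled GPT-2-style token features
        return self.head(torch.cat([speech_feats, text_feats], dim=-1)).squeeze(-1)

model = TurnInitiationRegressor()
pred = model(torch.randn(8, 512), torch.randn(8, 768))   # [8] offsets in seconds
loss = nn.functional.mse_loss(pred, torch.rand(8))       # regression against true offsets
```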
We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and the bot take turns driving the conversation, producing an engaging and fluent experience. Deployed in the fourth iteration of the Alexa Prize Socialbot Grand Challenge, Chirpy Cardinal handled thousands of conversations per day, placing second out of nine bots with an average user rating of 3.58/5.
The present work aims to study the performance of Hierarchical Temporal Memory (HTM) theory for automatically classifying text as well as documents. HTM is a biologically inspired theory based on the working principles of the human neocortex. This study intends to provide an alternative framework for document classification using the spatial pooler learning algorithm from HTM theory. Since HTM accepts only a binary data stream as input, the Latent Semantic Indexing (LSI) technique is used to extract the top features from the input and convert them into a binary format. The spatial pooler algorithm converts the binary input into sparse patterns, with similar input texts having overlapping spatial patterns, which makes it possible to classify the patterns into categories. The obtained results demonstrate that HTM theory, although still in its nascent stage, performs on par with most popular machine-learning-based classifiers.
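The preprocessing pipeline described above can be sketched as follows: LSI (TF-IDF followed by truncated SVD) extracts dense features, which are then thresholded into binary patterns of the kind an HTM spatial pooler consumes. Since no spatial pooler implementation is included here, a simple overlap match against per-class prototypes stands in for it purely for illustration; the component count, thresholding rule, and prototype construction are all assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat", "stock markets fell sharply today",
        "a kitten played with yarn", "investors sold shares amid losses"]
labels = np.array([0, 1, 0, 1])

# LSI: TF-IDF followed by truncated SVD
lsi = TruncatedSVD(n_components=2).fit_transform(TfidfVectorizer().fit_transform(docs))

# Binarize: mark components above their column mean as active bits (one of many options)
binary = (lsi > lsi.mean(axis=0)).astype(np.uint8)

# Overlap classification: assign a document the label of the class whose
# prototype (bitwise OR of its training patterns) it overlaps most.
prototypes = {c: np.bitwise_or.reduce(binary[labels == c]) for c in np.unique(labels)}

def classify(bits):
    return max(prototypes, key=lambda c: int(np.sum(bits & prototypes[c])))

print([classify(b) for b in binary])  # sanity check on the training patterns
```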
Recent work on open-domain question answering consults an external knowledge base with a retriever model, optionally reranks passages with a separate reranker model, and generates an answer with yet another reader model. Despite performing related tasks, these models have separate parameters and are only loosely coupled during training. In this work, we propose casting the retriever and the reranker as hard-attention mechanisms applied sequentially within a transformer architecture, and feeding the resulting computed representations to the reader. In this single model architecture, hidden representations are progressively refined from the retriever to the reranker to the reader, which uses model capacity more efficiently and also leads to better gradient flow when trained end-to-end. We also propose a pre-training methodology to effectively train this architecture. We evaluate our model on the Natural Questions and TriviaQA open datasets and, for a fixed parameter budget, our model outperforms the previous state-of-the-art by 1.0 and 0.7 exact match scores, respectively.
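Conceptually, retrieval and reranking become successive hard-attention steps inside one model: each stage scores passage representations and keeps only the top ones, whose progressively refined hidden states flow to the next stage. The sketch below is a loose illustration of that control flow; the encoder layers, top-k values, and scoring heads are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class RetrieveRerankRead(nn.Module):
    def __init__(self, dim=256, k_retrieve=32, k_rerank=4):
        super().__init__()
        self.retr_score = nn.Linear(dim, 1)
        self.rerank = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.rerank_score = nn.Linear(dim, 1)
        self.reader = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.k_retrieve, self.k_rerank = k_retrieve, k_rerank

    def forward(self, passage_reprs):            # [n_passages, dim]
        # Hard attention #1: keep the top-k passages by retrieval score.
        s1 = self.retr_score(passage_reprs).squeeze(-1)
        idx1 = s1.topk(min(self.k_retrieve, len(s1))).indices
        h = self.rerank(passage_reprs[idx1].unsqueeze(0)).squeeze(0)

        # Hard attention #2: rerank the survivors and keep a smaller set.
        s2 = self.rerank_score(h).squeeze(-1)
        idx2 = s2.topk(min(self.k_rerank, len(s2))).indices

        # The reader consumes the refined representations of the selected passages.
        return self.reader(h[idx2].unsqueeze(0)).squeeze(0)

out = RetrieveRerankRead()(torch.randn(100, 256))   # -> [4, 256]
```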